Performance Evaluation of Breadth-First Search on Intel Xeon Phi
Abstract
Breadth-First Search (BFS) is one of the most important kernels in graph computing. It is the main kernel of the Graph500 benchmark, which ranks large supercomputers and multiprocessor nodes by traversed edges per second (TEPS). In this paper we present the results of a BFS performance evaluation on the recently released high-performance Intel Xeon Phi coprocessor. We examine the previously proposed Queue-based and Read-based approaches to BFS implementation. We also apply several optimization techniques, such as manual loop unrolling and prefetching, that significantly improve performance on Intel Xeon Phi. On a representative graph set, the Intel Xeon Phi 7120P demonstrates a maximum speedup of 178% and an average speedup of 137% compared to the Intel Xeon E5-2660 processor. We achieved 4366 MTEPS on the Intel Xeon Phi 7120P for a graph of scale 25, which places 89th on the November 2013 Graph500 list and fourth among research teams in the class of single-node x86-based systems.
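To make the terms in the abstract concrete, here is a minimal sketch of a level-synchronous, queue-based BFS together with a TEPS-style measurement. It is not the authors' implementation: the CSR layout, the function names `build_csr`, `bfs_queue`, and `measure_teps`, and the choice to count every scanned adjacency entry as a traversed edge are all assumptions made for illustration, and the edge count is a simplified stand-in for the benchmark's own counting rule.

```python
import time


def build_csr(num_vertices, edges):
    """Build an undirected graph in CSR form (row offsets + column indices)."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        offsets[i + 1] = offsets[i] + degree[i]
    columns = [0] * offsets[-1]
    cursor = offsets[:-1].copy()  # next free slot in each vertex's row
    for u, v in edges:
        columns[cursor[u]] = v
        cursor[u] += 1
        columns[cursor[v]] = u
        cursor[v] += 1
    return offsets, columns


def bfs_queue(offsets, columns, root):
    """Level-synchronous BFS: the current frontier acts as the queue.

    Returns the parent array (-1 for unreached vertices) and the number
    of adjacency entries scanned (assumed edge-count convention).
    """
    parent = [-1] * (len(offsets) - 1)
    parent[root] = root
    frontier = [root]
    scanned = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for idx in range(offsets[u], offsets[u + 1]):
                v = columns[idx]
                scanned += 1
                if parent[v] == -1:  # first visit: record parent, enqueue
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent, scanned


def measure_teps(offsets, columns, root):
    """Time one BFS and return (parent array, scanned edges per second)."""
    start = time.perf_counter()
    parent, scanned = bfs_queue(offsets, columns, root)
    elapsed = time.perf_counter() - start
    teps = scanned / elapsed if elapsed > 0 else float("inf")
    return parent, teps
```

A call such as `measure_teps(*build_csr(6, [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5)]), 0)` runs one traversal from vertex 0. A tuned implementation would process each frontier in parallel and add optimizations such as the unrolling and prefetching mentioned in the abstract, none of which is attempted here.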
Similar resources
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cas...
Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi
In 2013 Intel introduced the Xeon Phi, a new parallel coprocessor board. The Xeon Phi is a cache-coherent manycore shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. The first published micro-benchmark studies indicate that many of Intel’s claims appear to be true. The current paper is the first study on the Phi of a complex artifi...
Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture
We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT on configurations with CPU ...
Optimization and Scaling of Multiple Sequence Alignment Software ClustalW on Intel Xeon Phi
This work is aimed to investigate and to improve the performance of multiple sequence alignment software ClustalW on the test platform EURORA at CINECA, for the case study of the influenza virus sequences. The objective is code optimization, porting, scaling and performance evaluation of parallel multiple sequence alignment software ClustalW for Intel Xeon Phi (the MIC architecture). For this p...
First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC
Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA’s Tesla Graphic...